{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Canned Estimators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we'll demonstrate how to use two Canned Estimators (these encapsulate the lower-level TensorFlow code we've seen so far, and use an API loosely inspired by [scikit-learn](scikit-learn.org). There are several advantages to Canned Estimators.\n",
    "\n",
    "* If you're using Estimators, you won't have to manage Sessions, or write your own logic for TensorBoard, or for saving and loading checkpoints.\n",
    "\n",
    "\n",
    "* You'll get out-of-the-box distributed training (of course, you will have to take care to read your data efficiently, and set up a cluster).\n",
    "\n",
    "Here, we'll read data using [input functions](https://www.tensorflow.org/api_docs/python/tf/estimator/inputs/numpy_input_fn), which are appropriate for in-memory data. \n",
    "\n",
    "* These provide batching and other features for you, so you don't have to write that code yourself.\n",
    "\n",
    "\n",
    "* In the strucuted data notebook, we'll use the new [Dataset API](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/programmers_guide/datasets.md) - which is a popular abstraction, and a great way to efficiently read and pre-process large datasets efficiently.\n",
    "\n",
    "Although the Estimators we'll use here are relative simple (a LinearClassifier, and a Fully Connected Deep Neural Network), we also provide more interesting ones (including for [TensorFlow Wide and Deep](https://www.tensorflow.org/tutorials/wide_and_deep). I'm also excited that additional Estimators are on their way - stay tuned in the upcoming months.\n",
    "\n",
    "Also note that Estimators log quite a lot of output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np\n",
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We'll use Keras (included with TensorFlow) to import the data\n",
    "(x_train, y_train), (x_test, y_test) = tf.contrib.keras.datasets.mnist.load_data()\n",
    "\n",
    "x_train = x_train.astype('float32')\n",
    "x_test = x_test.astype('float32')\n",
    "\n",
    "y_train = y_train.astype('int32')\n",
    "y_test = y_test.astype('int32')\n",
    "\n",
    "# Normalize the color values to 0-1\n",
    "# (as imported, they're 0-255)\n",
    "x_train /= 255\n",
    "x_test /= 255\n",
    "\n",
    "print(x_train.shape[0], 'train samples')\n",
    "print(x_test.shape[0], 'test samples')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's our input function. \n",
    "\n",
    "* By setting ```num_epochs``` to ```None```, we'll loop over the data indefinitely so we can train for as long as we like.\n",
    "* The default ```batch_size``` is ```128```, but you can provide a different parameter if you like.\n",
    "\n",
    "You can read more about the numpy input function [here](https://www.tensorflow.org/api_docs/python/tf/estimator/inputs/numpy_input_fn). We also provide a nice one for [Pandas](https://www.tensorflow.org/api_docs/python/tf/estimator/inputs/pandas_input_fn), more on that later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_input = tf.estimator.inputs.numpy_input_fn(\n",
    "    {'x': x_train},\n",
    "    y_train, \n",
    "    num_epochs=None, # repeat forever\n",
    "    shuffle=True # \n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "test_input = tf.estimator.inputs.numpy_input_fn(\n",
    "    {'x': x_test},\n",
    "    y_test,\n",
    "    num_epochs=1, # loop through the dataset once\n",
    "    shuffle=False # don't shuffle the test data\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# define the features for our model\n",
    "# the names must match the input function\n",
    "feature_spec = [tf.feature_column.numeric_column('x', shape=784)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we'll create a ```LinearClassifier``` - this is identical to our Softmax (aka, multiclass logistic regression model) from the second notebok."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "estimator = tf.estimator.LinearClassifier(feature_spec, \n",
    "                                          n_classes=10,\n",
    "                                          model_dir=\"./graphs/canned/linear\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# I've arbitrarily decided to train for 1000 steps\n",
    "estimator.train(train_input, steps=1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We should see about 90% accuracy here.\n",
    "evaluation = estimator.evaluate(input_fn=test_input)\n",
    "print(evaluation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's how you would print individual predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "MAX_TO_PRINT = 5\n",
    "\n",
    "# This returns a generator object\n",
    "predictions = estimator.predict(input_fn=test_input)\n",
    "i = 0\n",
    "for p in predictions:\n",
    "    true_label = y_test[i]\n",
    "    predicted_label = p['class_ids'][0]\n",
    "    print(\"Example %d. True: %d, Predicted: %d\" % (i, true_label, predicted_label))\n",
    "    i += 1\n",
    "    if i == MAX_TO_PRINT: break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's how easy it is to switch the model to a fully connected DNN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "estimator = tf.estimator.DNNClassifier(\n",
    "    hidden_units=[256], # we will arbitrarily use two layers\n",
    "    feature_columns=feature_spec,\n",
    "    n_classes=10,\n",
    "    model_dir=\"./graphs/canned/deep\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# I've arbitrarily decided to train for 2000 steps\n",
    "estimator.train(train_input, steps=2000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Expect accuracy around 97%\n",
    "evaluation = estimator.evaluate(input_fn=test_input)\n",
    "print(evaluation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you like, you can compare these runs with TensorBoard.\n",
    "\n",
    "``` $ tensorboard --logdir=graphs/canned/ ```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
